Grouping is a statistical procedure through which
members of the same group are considered as a single unit of
observation. There are various ways to assign group membership and
various ways to assign values of variables to groups. There are
methodological problems associated with grouping in general and with
particular methods of grouping. This paper argues that a wide variety
of complex analytical problems concerning inferences from grouped
observations can be understood from the use of a few simple
principles. The paper focuses on multiple regression models which use
grouping and shows that the effects of grouping depend centrally on
the quality of the specification of the regression model used.
Simulated examples as well as examples from the literature are
presented and discussed. The principles developed are then extended
to more complex cases. In particular, estimation from grouped
observations in systems of simultaneous equations and in dynamic
models for panel analysis are examined. (Author/JKS)
The difficulties involved in making inferences across units of
analysis have been discussed in every social science (Hannan, 1971).
Despite the considerable methodological literature that has developed,
research practice appears substantially unchanged. Research on
educational organization and the consequences of education is by
no means an exception (Burstein, 1975). So it may be useful to continue
to restate the difficulties involved in making inferences across units
of analysis. We focus on those aspects of the general problem that arise
in estimation from grouped observations.

The point of the paper is to argue that a wide variety of complex
analytic problems concerning inferences from grouped observations can be
understood by use of a few simple principles. To make this argument, we
restate the available results in simple terms. The thrust of the earlier
work is to show that the effects of grouping depend centrally on the quality
of model specification. To reinforce this perspective, we present the
results of a Monte Carlo simulation and analysis of empirical data. Then
we show that the simple principles can be extended in a straightforward
manner to the analysis of more complex cases than have been addressed in
the existing literature. We treat two cases: estimation from grouped
observations in systems of simultaneous equations and in dynamic models for
panel analysis.
II. Results From the Two Variable Regression Model

We will first briefly restate the well known results for the
two variable regression model:

$$ Y = \alpha + \beta X + u \qquad (1) $$

where the disturbance, $u$, has mean zero, constant variance and is
asymptotically uncorrelated with the regressor $X$. The least squares
estimator

$$ b = \frac{\sum_i (X_i - \bar{X})\, Y_i}{\sum_i (X_i - \bar{X})^2} $$

is a consistent estimator for $\beta$ of (1).

Consider the grouped regression model

$$ \bar{Y} = \alpha + \beta \bar{X} + \bar{u} \qquad (2) $$

and the least squares estimator

$$ \bar{b} = \frac{\sum_g (\bar{X}_g - \bar{X})\, \bar{Y}_g}{\sum_g (\bar{X}_g - \bar{X})^2} $$

where $\bar{X}_g$ and $\bar{Y}_g$ are the means of $X$ and $Y$ in group $g$.
The effects of grouping of observations are to be evaluated by comparing
properties of the two least squares estimators, $b$ and $\bar{b}$. To make such
comparisons we need to further specify the nature of the grouping process.

Any method of grouping observations that retains the absence of correlation
of the regressor and the disturbance, i.e., between $\bar{X}$ and $\bar{u}$, will yield
consistent estimation of $\beta$.

Three cases deserve mention: random grouping, grouping that maximizes
variation in X (grouping "by X") and grouping that maximizes variation
in Y (grouping "by Y"). It is widely noted (Prais and Aitchison, 1954;
Blalock, 1964; Hannan, 1971; Feige and Watts, 1972) that both random
grouping and grouping by X will preserve the lack of correlation between
$\bar{X}$ and $\bar{u}$ and as a result for both methods, plim $\bar{b}$ = plim $b$ = $\beta$.

Blalock (1964) was apparently the first analyst to point out that
grouping by Y will tend to produce a correlation between $\bar{X}$ and $\bar{u}$ even when
X and u are uncorrelated (in the sample). This means that least squares
applied to (2) will be inconsistent in this case (plim $\bar{b} \neq$ plim $b = \beta$).

¹Throughout we assume equal sized groups. For a treatment of efficient
estimation with unequal sized groups, see Prais and Aitchison (1954).

The results on efficiency of least squares estimators applied to (2)
are also well known (Cramer, 1964; Hannan and Burstein, 1973; Feige and
Watts, 1972). Random grouping eliminates systematic as well as error
variation indiscriminately and is as a result considerably more damaging
to efficiency of L.S. than is grouping by X. In fact, grouping by X is
optimal in the sense that no other grouping method can yield a grouped
L.S. estimator with smaller variance.
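
The three cases can be illustrated with a small simulation. The sketch below is not the authors' code; the sample size, the number of groups, the seed and the helper names `group_means` and `slope` are assumptions chosen only for illustration. It draws data from model (1) with $\beta = 1$, forms equal sized groups in three ways, and compares the grouped slopes with the ungrouped one.

```python
import numpy as np

def group_means(x, y, key, n_groups):
    """Sort by `key`, cut into equal-sized groups, return group means of x and y."""
    order = np.argsort(key)
    xg = x[order].reshape(n_groups, -1).mean(axis=1)
    yg = y[order].reshape(n_groups, -1).mean(axis=1)
    return xg, yg

def slope(x, y):
    """Least squares slope of y on x (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)           # hypothetical seed
n, n_groups, beta = 20_000, 2_000, 1.0   # illustrative sizes, 10 per group
x = rng.standard_normal(n)
u = rng.standard_normal(n)
y = 0.5 + beta * x + u

results = {
    "ungrouped": slope(x, y),
    "random": slope(*group_means(x, y, rng.random(n), n_groups)),
    "by_x": slope(*group_means(x, y, x, n_groups)),
    "by_y": slope(*group_means(x, y, y, n_groups)),
}
for name, b in results.items():
    print(f"{name:10s} slope = {b:.3f}")
```

Under these assumptions the ungrouped, randomly grouped and X-grouped slopes should all fall near 1, with the randomly grouped slope the noisiest of the three, while the Y-grouped slope should come out well above 1.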
III. Grouping and Specification Bias

Published (and unpublished) results on grouped and ungrouped data
typically show considerable divergence. In many instances this is the case
even when it seems unlikely that the data were grouped systematically by
values of the dependent variable. Some more general phenomenon seems to
be involved. We argued earlier (Hannan, 1971; Hannan and Burstein, 1973)
that grouping may give rise to systematic bias when the specification
of the ungrouped model is faulty. In fact, we argue, grouping may tend to
magnify errors of specification.

To show this, we consider a simple extension of the model considered
earlier:

$$ Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + u \qquad (3) $$

where we assume that the disturbance is asymptotically uncorrelated with
each regressor, $X_1$ and $X_2$. Suppose that a researcher fails to include
$X_2$ and instead estimates a model

$$ Y = \alpha' + \beta_1 X_1 + w. \qquad (4) $$

Following Theil (1957) we find:

$$ \operatorname{plim} b_1 = \beta_1 + \beta_2 b_{21} \qquad (5) $$

where $b_{21}$ is the sample regression of $X_2$ on $X_1$ (the coefficient of what
Theil calls the auxiliary regression). As long as the two regressors are
correlated, least squares applied to (4) will give inconsistent estimates
of $\beta_1$ in (3). The magnitude of the discrepancy, $\beta_2 b_{21}$,
is called the specification bias of the estimator (as an estimator of $\beta_1$).
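
Theil's result has an exact finite-sample counterpart: the least squares coefficient on $X_1$ in the short regression equals the long-regression coefficient on $X_1$ plus the long-regression coefficient on $X_2$ times the auxiliary slope. The following sketch checks this numerically on simulated data; the parameter values, the seed and the helper `ols` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(1)                 # hypothetical seed
n, beta1, beta2, r12 = 5_000, 1.0, 1.0, 0.5    # illustrative values

x1 = rng.standard_normal(n)
x2 = r12 * x1 + np.sqrt(1 - r12**2) * rng.standard_normal(n)
y = 2.0 + beta1 * x1 + beta2 * x2 + rng.standard_normal(n)

def ols(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

_, b1_long, b2_long = ols(y, x1, x2)   # correctly specified model
_, b1_short = ols(y, x1)               # model omitting X2
_, b21 = ols(x2, x1)                   # auxiliary regression of X2 on X1

# The two printed numbers agree up to floating point error.
print(b1_short, b1_long + b2_long * b21)
```

With $\beta_1 = \beta_2 = 1$ and a correlation of .5 between the regressors, the short-regression coefficient should sit near 1.5 rather than 1.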
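The grouped analogues to (3) and (4) are

$$ \bar{Y} = \alpha + \beta_1 \bar{X}_1 + \beta_2 \bar{X}_2 + \bar{u} \qquad (6) $$

and

$$ \bar{Y} = \alpha' + \beta_1 \bar{X}_1 + \bar{w}. \qquad (7) $$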



The properties of least squares applied to (6) can be understood
by using results of the previous section. It is still the case that
random grouping preserves unbiasedness of least squares but sacrifices
efficiency, grouping by Y biases least squares, and grouping by one or
more independent variables preserves unbiasedness and is more efficient
than random grouping.² (The results of the simulation presented in Section
IV are relevant to this discussion.)

Least squares applied to (7) yields an estimator $\bar{b}_1$ such that

$$ \operatorname{plim} \bar{b}_1 = \beta_1 + \beta_2 \bar{b}_{21}. \qquad (8) $$

As before, if $X_1$ and $X_2$ are correlated, least squares applied to (7) gives
inconsistent estimation of $\beta_1$.

Next, contrast the two (inconsistent) estimators, $b_1$ and $\bar{b}_1$. We see

$$ \operatorname{plim} (b_1 - \bar{b}_1) = \operatorname{plim} (\beta_2 [b_{21} - \bar{b}_{21}]), \qquad (9) $$

and note that the grouped and ungrouped results will diverge if $\bar{b}_{21}$
differs from $b_{21}$. But both $b_{21}$ and $\bar{b}_{21}$ are coefficients from two variable
regressions so we can use results from the previous section. With
both random grouping and grouping by $X_1$, plim $\bar{b}_{21}$ = plim $b_{21}$. As a
result, plim $(\bar{b}_1 - b_1) = 0$ and on the average one will obtain the same
(incorrect) results from the grouped and the ungrouped regressions.



²These results can be shown most easily by adopting the grouping matrix
(transformation) approach of Prais and Aitchison (1954); see also
Feige and Watts (1972). The key is to analyze whether or not the
transformation is systematically related to the disturbance. For example,
when the independent variables are uncorrelated with the disturbance,
grouping that maximizes variation in one or more of them will be unrelated
to the disturbance.

³We will not continue to repeat that grouping by Y will always lead to
inconsistent estimation.
Quite the contrary is the case for grouping by the omitted variable, $X_2$.
For when observations are grouped systematically by $X_2$, this is grouping
by the dependent variable from the perspective of the auxiliary regression.
As we noted in the previous section, grouping by the dependent variable
leads to inconsistent estimation, i.e., plim $\bar{b}_{21} \neq$ plim $b_{21}$. As a result,
the grouped and ungrouped least squares estimators differ in the limit
and one will tend to draw different inferences from each. We conclude,
then, that grouping by an omitted variable has consequences quite different
from random grouping or grouping by an independent variable.
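
A minimal simulation sketch of this contrast follows. It is an illustration under assumed values (sample size, number of groups, a correlation of .5 between the regressors, unit coefficients and the seed are all hypothetical), not a reproduction of any analysis in the paper. For each grouping method it reports the slope from the model that omits $X_2$ and the slope of the auxiliary regression of $X_2$ on $X_1$.

```python
import numpy as np

def slope(x, y):
    """Least squares slope of y on x (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def group_by(key, n_groups, *variables):
    """Sort by `key`, cut into equal-sized groups, return group means."""
    order = np.argsort(key)
    return [v[order].reshape(n_groups, -1).mean(axis=1) for v in variables]

rng = np.random.default_rng(2)               # hypothetical seed
n, n_groups, r12 = 50_000, 5_000, 0.5        # illustrative values
x1 = rng.standard_normal(n)
x2 = r12 * x1 + np.sqrt(1 - r12**2) * rng.standard_normal(n)
y = x1 + x2 + rng.standard_normal(n)         # beta1 = beta2 = 1

keys = {"random": rng.random(n), "by_x1": x1, "by_x2": x2}
short = {"ungrouped": slope(x1, y)}          # misspecified model: Y on X1 only
aux = {"ungrouped": slope(x1, x2)}           # auxiliary regression: X2 on X1
for name, key in keys.items():
    g1, g2, gy = group_by(key, n_groups, x1, x2, y)
    short[name] = slope(g1, gy)
    aux[name] = slope(g1, g2)

for name in short:
    print(f"{name:10s} b1 = {short[name]:.3f}   b21 = {aux[name]:.3f}")
```

In this setup the ungrouped, random and by-$X_1$ estimates of the coefficient on $X_1$ should all land near 1.5 (the true value 1 plus a specification bias of about .5), while grouping by the omitted $X_2$ should push both the auxiliary slope and the estimate well beyond that.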

It is important to know whether there is a possibility that the
grouped estimator would be closer (asymptotically) to $\beta_1$ than is the
ungrouped estimator. For such an "aggregation gain" to occur, $|\bar{b}_{21}| < |b_{21}|$
(compare (5) and (8)). None of the cases we considered yield this result.
Hannan and Burstein (1973) show that, when observations are grouped by
rules that relate additively to variables in the model, grouping will
magnify any specification bias in the ungrouped estimator. Nevertheless,
the possibility exists that the data are grouped by some non-additive
combination of variables (e.g., place into the same group those
observations that are highest on $X_1$ but lowest on $X_2$); see the discussion in
Hanushek et al. (1974). While such outcomes seem highly unlikely in
"natural" groupings, we cannot conclude that grouping always magnifies
specification bias.

It may seem unusual to compare the properties of alternative
estimators of misspecified models. When we consider properties of estimators
we routinely assume proper model specification. However, as researchers
we acknowledge the practical difficulties involved in arriving at the
correct specification and treat substantive models as partial and
tentative. In a sense, then, we commonly admit to likely misspecification
of our models. In this light it seems important to realize that
specification errors that may be small and not very damaging to inference in
an ungrouped analysis may be magnified by grouping into very sizable
errors.

In the next two sections we illustrate, first with a Monte Carlo
simulation and then with analysis of real data, the considerable damage
to inference that is possible.
IV. A Monte Carlo Simulation

To illustrate each of the points made in the preceding sections we
designed the following Monte Carlo simulation. We generated 100
samples of size 500 from a population characterized by

$$ Y = 1 X_1 + 1 X_2 + \delta u $$

where $X_1$ and $X_2$ were both N(0,1), $u$ was N(0,1) distributed
independently of $X_1$ and $X_2$, and $\delta$ was a constant set at two different
levels to vary $R^2$ for the equation. We varied $R^2$ (at .3 and .7) and
$r_{12}$ (at .25, .50, and .75). For each of the six parameter combinations
of $R^2$ and $r_{12}$ we then grouped observations into 50 equal sized groups
in each sample generated by three methods: randomly, by values of $X_1$,
and by values of $X_2$.⁴

We estimated (with ordinary least squares) each of the following
regressions

$$ Y = \beta_1 X_1 + \beta_2 X_2 + u \qquad (10) $$
$$ Y = b_1 X_1 + v \qquad (11) $$

from ungrouped and grouped data (for each grouping method).
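
The text does not say how the constant $\delta$ was set. One natural possibility, offered here only as an assumption, is to solve the population $R^2$ for $\delta$: with unit coefficients and standard normal regressors the explained variance is $2 + 2 r_{12}$, so $R^2 = (2 + 2 r_{12}) / (2 + 2 r_{12} + \delta^2)$. The function name `delta_for` is hypothetical.

```python
import math

def delta_for(r_squared, r12):
    """Scale of the disturbance that gives the target population R^2.

    Assumes Y = X1 + X2 + delta*u with Var(X1) = Var(X2) = Var(u) = 1
    and Corr(X1, X2) = r12, so the explained variance is 2 + 2*r12.
    """
    explained = 2.0 + 2.0 * r12
    return math.sqrt(explained * (1.0 - r_squared) / r_squared)

# The six parameter combinations used in the simulation design.
for r_squared in (0.7, 0.3):
    for r12 in (0.25, 0.50, 0.75):
        print(f"R^2 = {r_squared}, r12 = {r12}: delta = {delta_for(r_squared, r12):.3f}")
```

Lower target $R^2$ requires a larger $\delta$, which is one reason the randomly grouped estimates can be expected to be noisier in the $R^2 = .3$ panels of the tables.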

Before treating the misspecified model, consider first the results
for the correctly specified equation, (10). As far as we know, no one
has shown the consequences of grouping by one regressor in a multiple
regression. As we see in Tables 1 and 3, grouping by $X_1$ or $X_2$ yields
a pattern of estimates that centers relatively close to the true values.
These results further confirm our claim that, except for grouping by Y,
grouping affects consistency only by magnifying specification bias.
When, as here, there is no specification bias, grouping observations
by one of a set of regressors does not alter the average value of the
least squares estimators.

⁴In grouping by $X_1$ we ordered observations in decreasing order by $X_1$
values and placed the first 10 observations into one group, the next
10 in the next group, etc. Then each of the grouped observations was
replaced with the group mean.

Earlier we noted that grouping by the regressor in a two variable
regression is optimal in the sense that it minimizes variance of the
estimator (among the class of consistent grouped estimators). In Table
5 we see that mean squared error (bias squared plus variance) of estimates
over the 100 samples is considerably lower for grouping by either of the
regressors, as contrasted with random grouping. Presumably grouping by
both regressors simultaneously would further reduce the variance of the
grouped estimator.

The results on mean error for the misspecified equation, Table 2,
closely fit our expectations. Notice first the specification bias in
the ungrouped estimator. Since both $X_1$ and $X_2$ are N(0,1), $b_{21} = r_{12}$
and plim $b_1 = 1 + \beta_2 r_{12}$. Since $\beta_2 = 1$, what we are calling error
is simply $r_{12}$. The data conform closely to this result. Grouping by $X_1$
gives mean errors almost identical to those for the ungrouped case. As we
suggested earlier, grouping by $X_1$ in this case gives optimal estimates of
the wrong term, $\beta_1 + \beta_2 b_{21}$. Finally, grouping by the omitted variable, $X_2$,
greatly magnifies the specification bias. The magnitude of the inflation
varies from a two-thirds increase (for $r_{12} = .75$) to a more than six-fold
increase (for $r_{12} = .25$).⁵
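
The size of the inflation can be checked by simple arithmetic on the mean errors reported in Table 2. The sketch below uses the first panel of that table (taken here to be the $R^2 = .7$ panel); the dictionary names are hypothetical.

```python
# Mean errors from the first panel of Table 2, keyed by r12.
ungrouped = {0.25: 0.246, 0.50: 0.498, 0.75: 0.751}
by_x2 = {0.25: 1.667, 0.50: 1.571, 0.75: 1.253}

# Ratio of the bias under grouping by X2 to the ungrouped specification bias.
inflation = {r12: by_x2[r12] / ungrouped[r12] for r12 in ungrouped}

for r12, ratio in inflation.items():
    print(f"r12 = {r12:.2f}: bias multiplied by {ratio:.2f}")
```

The ratio is smallest for the highest correlation between the regressors and largest for the lowest, matching the pattern described in the text.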



⁵This pattern is not surprising since the lower $r_{12}$, the greater the
upward bias in $\bar{b}_{21}$ from grouping by $X_2$ (as long as $r_{12} > 0$) (cf. Blalock,
1964).
V. An Empirical Illustration

In a recent publication, Bidwell and Kasarda (1975) purport
to demonstrate that organizational properties of school districts (e.g.
teacher/pupil ratio) affect student achievement by regressing district
mean achievement on school district properties. They argue strenuously
against including (aggregated) properties of individuals in these analyses:

...all measured relationships are at the level of the school
district. None pertains to individuals, schools, or other sub-units
of the district. The reader should keep in mind that we are not
analyzing antecedents of the academic achievement of individual
students, rather the overall effectiveness of a school district
as measured by the aggregate achievement level for all its students
at a given grade. Introducing multiple levels of analysis into the
same model brings difficulties of estimation and interpretation
(e.g., the "ecological fallacy") that we wish to avoid (Bidwell
and Kasarda, 1975: 63).

We have no quarrel with such organizational analysis. However, given the
extensive knowledge available concerning the determinants of achievement
at the individual level and the known correlation of at least some of
them, e.g. SES, with school quality, we wonder if omitting all but school
district properties from the model gives unrealistically large
organizational effects. In fact, we argue that omitting other (correlated)
causes of achievement together with estimation from the grouped
observations produces exactly the type of fallacious inference Bidwell and
Kasarda claim to avoid.

To see this, suppose that school district characteristics (SD) and the
SES background of students both affect their academic achievement, A:⁶

$$ A = a_1\, SD + a_2\, SES + w $$

assuming that $r_{SD,SES} > 0$ and that $w$ is distributed independently of the
two regressors.

The models (i.e. relevant parts of the models) estimated by Bidwell
and Kasarda take the form:

$$ \bar{A} = a_1\, \overline{SD} + \bar{w} $$

(where by definition $\overline{SD} = SD$). To evaluate the properties of the
least squares estimator of $a_1$ we refer to the results already presented.
Elimination of SES from the ungrouped equation introduces
specification bias in least squares estimation of the school district
effect. What about magnification of this bias from grouping? If, as
we suspect, individuals are selected into school districts on the basis
of their SES, the observations in Bidwell and Kasarda's analysis were
(to some extent) grouped by the omitted variable. If so, their reported
school effects would greatly overstate the case (be upwardly biased).

⁶Note that this is not Bidwell and Kasarda's model. But, if school
district characteristics do affect achievement, they should appear in
this sort of model. In a sense, Bidwell and Kasarda did introduce
some input variables (though they consider them to be environmental
constraints), namely percent non-white and education of the adult
population. They report that adult education (proportion of the population
with at least four years of education) had an insignificant effect in the
achievement regression (although the beta-weights reported are sizable).
Perhaps the failure to measure education of parents accounts for the small
effect of education. At any rate, Bidwell and Kasarda then excluded such ...

We cannot, of course, determine from Bidwell and Kasarda's analysis
how serious this problem is. We had access to data on sixth graders in
California schools and school districts that enable us to illustrate the
damage to inference. The outcome measure used is reading achievement
(as in Bidwell and Kasarda's analysis). The structural variables
available at the district level were resources (expenditures/average
daily attendance), and pupil/teacher ratios.⁷ The input variables
available were parents' occupation (six categories) and IQ estimates by
teachers. Our strategy was to construct two regression models: one regresses
achievement on only the school district structural properties (resources,
pupil/teacher ratio) and percent non-white; the second adds the input
variables. Then we estimate (with ordinary least squares) each regression
with pupil level inputs and outputs, school mean inputs and outputs,
and district mean inputs and outputs. This allows us to contrast
specification bias at each level.

⁷Following Bidwell and Kasarda we include percent non-white in the model
as well.

The results are presented in Table 7. Consider first the improperly
specified equation analogous to that used by Bidwell and Kasarda (the
three columns on the left in Table 7). The beta-weights associated with
district structural properties are enormously inflated with grouping.
Each effect more than doubles from the pupil to the school level
(from -.119 to -.465 and from -.167 to -.368), and there is a second large
increase from the school to district level (-.465 to -.664, and -.368
to -.737). Then, consider the effect of introducing the input variables.⁸
Their addition does not greatly alter the district property effects at the
pupil level.⁹ But, the effects of grouping are greatly reduced by the
improvement in the specification. The change in the district structure
effects is very much smaller, going from pupil to school level (from
-.149 to -.196 and from -.146 to -.048). So these results conform very
closely to our expectations. They document the magnitude
of the hazard to correct inference arising in estimation from grouped
data and poorly specified models.

⁸The regression containing inputs cannot be estimated at the district level
due to small number of observations.

⁹We suspect that adding more input variables would considerably reduce
these effects, however.
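
The changes quoted in the text can be turned into ratios with a few lines of arithmetic. In the sketch below the numbers are those quoted in the preceding paragraphs; attaching the first series to resources and the second to the pupil/teacher ratio is an assumption based on the order in which the two structural variables are named, and the helper `ratios` is hypothetical.

```python
# Beta-weights quoted in the text, ordered pupil -> school -> district.
misspecified = {
    "resources": (-0.119, -0.465, -0.664),
    "pupil/teacher ratio": (-0.167, -0.368, -0.737),
}
# With input variables added; only pupil and school levels were estimable.
with_inputs = {
    "resources": (-0.149, -0.196),
    "pupil/teacher ratio": (-0.146, -0.048),
}

def ratios(levels):
    """Ratio of each coefficient to the one at the next lower level."""
    return [round(b / a, 2) for a, b in zip(levels, levels[1:])]

misspec_ratios = {name: ratios(v) for name, v in misspecified.items()}
inputs_ratios = {name: ratios(v) for name, v in with_inputs.items()}

print(misspec_ratios)
print(inputs_ratios)
```

Without the input variables each coefficient at least doubles in moving from pupils to schools; with them, one coefficient grows by roughly a third and the other shrinks.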
VI. Extensions

In this section we illustrate how the simple analytic results of
the preceding sections can be used to handle more complex models. We
consider the two types of extensions most likely to be of interest to
substantive researchers: simultaneous equations and dynamic models.

A. A System of Simultaneous Equations

Causal systems in which there is reciprocal causation (e.g., aspirations
affect achievement, and achievement affects aspirations) lead to complex
inference problems. Let us confine the discussion to a simple case:

$$ Y_1 = \beta_{12} Y_2 + \alpha_{11} X_1 + w_1 \qquad (12) $$
$$ Y_2 = \beta_{21} Y_1 + \alpha_{22} X_2 + w_2 \qquad (13) $$

where $w_1$ and $w_2$ are distributed independently of $X_1$ and $X_2$. It is easy
to show (cf. Johnston, 1971) that the endogenous variables on the right
hand side ($Y_2$ in (12), $Y_1$ in (13)) are correlated with the disturbances.
Least squares applied to either (12) or (13) will be inconsistent; the
estimates contain "simultaneous equations bias."

Instead of using ordinary least squares, we solve for the so-called
reduced-form:

$$ Y_1 = \pi_{11} X_1 + \pi_{12} X_2 + \xi_1 \qquad (14) $$
$$ Y_2 = \pi_{21} X_1 + \pi_{22} X_2 + \xi_2 \qquad (15) $$

Ordinary least squares applied to (14) and (15) is consistent. Having
obtained estimates of the reduced-form coefficients, we can, in this
simple case, solve directly for the estimates of parameters of (12)-(13).
More generally, we calculate $\hat{Y}_1$ and $\hat{Y}_2$ from estimated (14)-(15) and
substitute them for $Y_1$ and $Y_2$ on the RHS of (12)-(13) and then apply ordinary
least squares to the revised system (12)-(13). The first method is
called indirect least squares, the second, two-stage least squares. Both
methods lead to consistent estimation.
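
Indirect least squares can be sketched on simulated data. The example assumes a two-equation system in which $Y_1$ depends on $Y_2$ and $X_1$, and $Y_2$ depends on $Y_1$ and $X_2$; all parameter values, the seed and the variable names are hypothetical. Solving such a system gives reduced-form coefficients whose ratios recover the structural coefficients on the endogenous variables.

```python
import numpy as np

rng = np.random.default_rng(3)                 # hypothetical seed
n = 100_000
b12, b21, a11, a22 = 0.5, 0.3, 1.0, 1.0        # illustrative structural values

x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
w1, w2 = rng.standard_normal(n), rng.standard_normal(n)

# Solve y1 = b12*y2 + a11*x1 + w1 and y2 = b21*y1 + a22*x2 + w2 jointly.
d = 1.0 - b12 * b21
y1 = (a11 * x1 + b12 * a22 * x2 + w1 + b12 * w2) / d
y2 = (b21 * a11 * x1 + a22 * x2 + b21 * w1 + w2) / d

# Reduced form: regress each endogenous variable on both exogenous variables.
X = np.column_stack([x1, x2])
pi1, *_ = np.linalg.lstsq(X, y1, rcond=None)   # coefficients of x1, x2 in y1
pi2, *_ = np.linalg.lstsq(X, y2, rcond=None)   # coefficients of x1, x2 in y2

# Indirect least squares: structural coefficients as reduced-form ratios.
b12_ils = pi1[1] / pi2[1]
b21_ils = pi2[0] / pi1[0]

# Direct least squares on the first structural equation, for comparison.
direct, *_ = np.linalg.lstsq(np.column_stack([y2, x1]), y1, rcond=None)
b12_ols = direct[0]

print(f"true b12 = {b12}, ILS = {b12_ils:.3f}, direct OLS = {b12_ols:.3f}")
```

With these values the indirect estimate should sit close to the true coefficient while direct least squares overstates it, which is the simultaneous equations bias described above.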
Next consider the grouped data:

$$ \bar{Y}_1 = \beta_{12} \bar{Y}_2 + \alpha_{11} \bar{X}_1 + \bar{w}_1 \qquad (16) $$
$$ \bar{Y}_2 = \beta_{21} \bar{Y}_1 + \alpha_{22} \bar{X}_2 + \bar{w}_2 \qquad (17) $$

which has as its reduced-form,

$$ \bar{Y}_1 = \pi_{11} \bar{X}_1 + \pi_{12} \bar{X}_2 + \bar{\xi}_1 \qquad (18) $$
$$ \bar{Y}_2 = \pi_{21} \bar{X}_1 + \pi_{22} \bar{X}_2 + \bar{\xi}_2 \qquad (19) $$

If the data are grouped by values of $X_1$ or $X_2$, our earlier results
imply that estimation of the grouped reduced-form will be asymptotically
equivalent to estimation in the ungrouped reduced-form. Since estimates
of the parameters of (16)-(17) are functions of the reduced-form estimates,
both indirect least squares and two-stage least squares will be
asymptotically equivalent for both grouped and ungrouped estimators.

Grouping by an endogenous variable, $Y_1$ or $Y_2$, is quite another
matter. We have noted that grouping by a dependent variable leads to
inconsistent estimation in that equation. So if we group by $Y_1$, say,
the first equation in the reduced-form, (18), will be inconsistently
estimated. Since the estimates of the coefficients of (16)-(17) are
functions of all the reduced-form estimates, grouping by an endogenous
variable will lead to inconsistent estimation throughout the structural
form (16)-(17).



B. Dynamic Models

Consider the following simple dynamic formulation¹⁰ applied to
panel observations on N individuals over T waves of observation (e.g.,
a cohort of students observed once a year for several years). Assuming
that the causal structure is constant, we pool all NT observations into¹¹

$$ Y_{it} = \alpha_1 Y_{i,t-1} + \alpha_2 X_{it} + u_{it}. \qquad (20) $$

It is usually the case that the disturbances in such models are positively
autocorrelated, so that least squares applied to (20) will be inconsistent.
To go more deeply into the problem we must specify the nature of the
process accounting for the autocorrelation. Assume that the disturbance
has the following variance-components form (cf. Nerlove, 1971)

$$ u_{it} = \mu_i + v_{it} \qquad (21) $$

where $\mu_i$ is a time invariant constant associated with each unit (e.g.,
each pupil's unobserved characteristics) and uncorrelated with $X_{it}$,
and $v_{it}$ is a well behaved random N(0,1) disturbance, uncorrelated with
$\mu_i$. This model has proven useful in panel analysis (cf. Nerlove, 1971;
Hannan and Young, 1974).

It is helpful to restate the model in matrix form to highlight the
parallel with the foregoing. Let

$$ Z = (Y_{-1},\ X) $$

where $X$ and $Y_{-1}$ are NT x 1 vectors, and let

$$ \mu = (\mu_1, \mu_2, \ldots, \mu_N)' $$

where the $\mu_i$'s are the coefficients corresponding to the dummy variables
contained in $A = (\delta_1, \delta_2, \ldots, \delta_N)$, where $\delta_i$ is an NT x 1 dummy variable for
the ith individual.

Then the model (20)-(21) can be written:

$$ Y = Z\alpha + A\mu + v. \qquad (22) $$

The usual least squares procedure employs only

$$ Y = Z\alpha + u. \qquad (23) $$

The omission of the set of dummy variables, $A$, leads to the type of
specification bias considered above.

¹⁰The model is dynamic due to the presence of the lagged dependent variable;
it is a stochastic difference equation.

¹¹We pool observations to allow correction for the autocorrelation problem
discussed below. The problem of autocorrelation remains whether or
not one pools observations.

Consider next the grouped versions of (22) and (23):

$$ \bar{Y} = \bar{Z}\alpha + \bar{A}\mu + \bar{v} \qquad (24) $$

and

$$ \bar{Y} = \bar{Z}\alpha + \bar{u}. \qquad (25) $$

Obviously least squares applied to (25), a misspecified model, will be
inconsistent.¹² But will the results differ from those of the ungrouped
case? As always, it depends on the nature of the grouping process.
Our previous results follow immediately from a generalization of Theil's
(1957) specification analysis. If the data are grouped by Z (either
X or Y) there is no magnification of the specification bias from the
ungrouped estimator. If the data are grouped by $\mu$ (i.e. if the
unobserved properties of the units are related to the selection of the group)
the bias from the ungrouped estimator is magnified.

Finally, note that least squares applied to (22) is a consistent
estimator as is least squares applied to (24); we call these least
squares with dummy variables (LSDV). These estimators are asymptotically
equivalent to generalized least squares estimators (Amemiya, 1967; see
also Hannan and Young, 1974).

¹²For a treatment of (24) in terms of grouping matrices, see Appendix A.



Conclusion

Comparisons of grouped and ungrouped estimators for models that are
incorrectly specified clarify the manner in which grouping affects
inferences from regression analysis. In particular, under quite general
conditions when observations are grouped by some criterion that maximizes
variation in an omitted variable (correlated with included regressors),
grouping magnifies the specification bias of the ungrouped estimator.
Both our Monte Carlo simulation and substantive analysis suggest that the
magnification can be quite large. As a result qualitative inferences from
regressions with grouped data may differ greatly from those made from
regressions with ungrouped data.

The principles used in the specification bias analysis of the effects
of grouping can be used to understand the effects of grouping in more
complex models. In particular, we have shown that grouping in a simultaneous
equations model and in a dynamic model for panel analysis introduces no
complications that cannot be addressed with this methodology.



Appendix A

Estimation with Grouped Data in a Pooled Cross-Section Time Series Model

The effect of aggregation in the context of pooling
may be conveniently analyzed with the notations introduced by Prais and
Aitchison (1954). The model we shall examine is the one defined in Chapter VI, B.
In matrix notations:

$$ Y = Z\alpha + A\mu + v $$

where $A = (\delta_1, \delta_2, \ldots, \delta_N)$.

If the data are arranged so that the N rows corresponding to individual
observations at the first time period are placed first, those corresponding
to the second time period next, etc., then $A$ can be written explicitly in
matrix notation as:

$$ A = 1_T \otimes I_N $$

where $1_T$ is a T x 1 vector of 1's, $I_N$ is the N x N identity matrix, and
$\otimes$ denotes the Kronecker product (direct product).

The aggregation rule which one is most likely to use in this context
consists of taking averages over the same individuals at each time period (for
example, the academic performances of students measured over time are
aggregated at the classroom level). In that case, and if one assumes that the
number of individual observations in each group is the same, the aggregation
procedure may be represented by the linear transformation:

$$ M = I_m \otimes \tfrac{1}{n} 1_n' $$

applied to the data corresponding to each time period, where
m is the number of groups and
n is the number of individuals within a group (n x m = N).
Since there are T time periods, the complete transformation becomes:

$$ G = I_T \otimes M. $$

The grouped model may now be written as:

$$ GY = GZ\alpha + GA\mu + Gv. $$

It is important to investigate the effect of aggregation on the
error-components aspects of the pooled model, particularly the extent to which the
specific methods developed by Nerlove (1971) and Hannan and Young (1974) have
to be modified. To do this, we need to know the resulting value of $GA\mu$, the
aggregated individual-specific error terms.

From the above definitions,

$$ GA\mu = [I_T \otimes (I_m \otimes \tfrac{1}{n} 1_n')]\,[1_T \otimes I_N]\,\mu = [1_T \otimes (I_m \otimes \tfrac{1}{n} 1_n')]\,\mu. $$

The last expression may be seen to be equivalent to $(1_T \otimes I_m)\,\bar{\mu}$, where
$\bar{\mu}$ is the m x 1 vector containing the averages of the $\mu_i$'s corresponding to
the individuals aggregated in a particular group. Notice that $1_T \otimes I_m$ is of
the same form as $A$ so that the aggregated model can be adequately represented
with m "dummies" corresponding to each group, and "coefficients" $\bar{\mu}_j$, j = 1, ...,
m, that consist of averages of the n individual error terms within a group. It
follows that the methods for dealing with pooled models (LSDV or GLS) may be
applied without change to data aggregated in this manner.
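
The Kronecker product identities behind this argument can be verified numerically for small dimensions. The sketch below assumes observations are ordered by time period and, within each period, by group, so that an averaging matrix of the form $I_m \otimes \frac{1}{n} 1_n'$ applies; the dimensions and the vector of unit effects are hypothetical.

```python
import numpy as np

T, m, n = 3, 2, 4              # time periods, groups, individuals per group
N = m * n

A = np.kron(np.ones((T, 1)), np.eye(N))        # NT x N matrix of unit dummies
M = np.kron(np.eye(m), np.ones((1, n)) / n)    # m x N within-period averaging
G = np.kron(np.eye(T), M)                      # mT x NT complete transformation

mu = np.arange(1.0, N + 1)     # hypothetical unit effects 1, 2, ..., N
mu_bar = M @ mu                # group averages of the unit effects

lhs = G @ A @ mu                                        # aggregated unit effects
rhs = np.kron(np.ones((T, 1)), np.eye(m)) @ mu_bar      # group dummies times averages

print(mu_bar)
print(np.allclose(lhs, rhs))
```

The aggregated effects repeat the m group averages once per time period, which is exactly the dummy-variable structure of the ungrouped model with groups in place of individuals.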



Table 1

Mean Errors of Estimate
for $Y = \beta_1 X_1 + \beta_2 X_2 + u$

(correct specification)



"12 

Urigroupad 

Random 
grouping ' 

Grouping 
by X- 



Grouping 
by 



"12 

Ungrouped 

Raidoa 
grouplna 

Group Infi; 
by. X, 



.25 



1 

-.004 



"2 
.007 



-.004 -.001 



.013 



.002 



.25 



••009 



-2 
.017 



%010 -.001 
.001 -.015 



.50 



"1_ 
-.006 

-.005 



.001 -.006 , -.018 



,002 
.2 



"2 

.009 

.001 
.033 

.006 



R ° .3 



.50 



1 
.013 

.012 

,043 



"2 
.021 

.003 

.082- 



.75 



"1 - 
-.010 



2 

.013 



-.008"- ;0a4 



.034 -,045 



-.022 



.022 



.75 



"1 
-.022 



2 

.030 



-.018 .010 
.079 -.105 



by X, 



.031 



,005 



-.006 



.014 



-»052 



.050 
Table 2

Mean Errors of Estimate
for $Y = b_1 X_1 + v$
(misspecified model)

                          R^2 = .7
r12                   .25      .50      .75

Ungrouped             .246     .498     .751
Random grouping       .249     .502     .755
Grouping by X1        .246     .498     .751
Grouping by X2       1.667    1.571    1.253

                          R^2 = .3
r12                   .25      .50      .75

Ungrouped             .243     .496     .751
Random grouping       .244     .498     .752
Grouping by X1        .244     .497     .751
Grouping by X2       1.691    1.583    1.261
Table 3

Proportion of Positively Biased Estimates
for $Y = \beta_1 X_1 + \beta_2 X_2 + u$
(correct specification)

                               R^2 = .7
r12                   .25            .50            .75
                    b1     b2      b1     b2      b1     b2

Ungrouped           .44    .56     .45    .59     .46    .56
Random grouping     .49    .48     .47    .48     .47    .49
Grouping by X1      .52    .45     .41    .55     .60    .42
Grouping by X2      .58    .48     .53    .57     .46    .55
Table 4

Proportion of Positively Biased Estimates
for $Y = b_1 X_1 + v$
(misspecified model)

                          R^2 = .7
r12                   .25      .50      .75

Ungrouped            1.00     1.00     1.00
Random grouping       .90     1.00     1.00
Grouping by X1       1.00     1.00     1.00
Grouping by X2

                          R^2 = .3
r12                   .25      .50      .75

Ungrouped             .99     1.00     1.00
Random grouping       .70      .89      .96
Grouping by X1        .99     1.00     1.00
Grouping by X2       1.00     1.00     1.00
Table 5

Mean Squared Errors of Estimate
for $Y = \beta_1 X_1 + \beta_2 X_2 + u$
(correct specification)

                          R^2 = .7
r12                   .25      .50      .75














Ungr ouped 


An** 


« uy^ 




003 


007 






Random 
grouping 


• U^/ 


• 




O^Q 




',081 


■ ■ 


Grouping 
oy 1 


*003 






.037 


.034 


,057 




Grouping 






.3 


.013 


.078 


.043 


- . , ' . , . " 

■ " ■ "'- :^ 




                               R^2 = .3
r12                   .25            .50            .75
                    b1     b2      b1     b2      b1     b2

Ungrouped           .012   .011    .018   .018    .037   .038
Random grouping     .147   .135    .236   .210    .492   .443
Grouping by X1      .016   .102    .063   .199    .336   .311
Grouping by X2      .147   .019    .224   .071    .427   .234


\ - ■ - "'. i'^ 



Table 6

Mean Squared Errors of Estimate
for $Y = b_1 X_1 + v$
(misspecified model)

                          R^2 = .7
r12                   .25      .50      .75

Ungrouped             .064     .252     .568
Random grouping       .106     .295     .611
Grouping by X1        .064     .252     .568
Grouping by X2       2.842    2.484    1.577

                          R^2 = .3
r12                   .25      .50      .75

Ungrouped             .071     .260     .579
Random grouping       .203     .408     .746
Grouping by X1        .072     .261     .579
Grouping by X2       2.978    2.549    1.618
Table 7

Comparison of Alternative Specifications of a Reading Achievement
Regression: Ungrouped and Grouped Data (beta-weights)

Level of Analysis

Independent Variables








tMBmtm 




-.119 -M -.664; , 






f upil/Tiichir h 


itio 


-.167 ",368 -;737 ' 


-.146 : 


-.048^ 






-m ' , ",007 :^ 


-.109 ^ 


: -.198 


PiEinti' Occupit 


ion (1) 




.03? 


.143-; 


Paints' Ofifiupit 


ion (2) 




''^^■,';;:.ia6; 


-.072 y 


firwts' Oceupat 


ion (3) 




,160 ■ 


; .255 


liriats' Oecupii 


iaa (4) 




: .202: r 


■■y;';l64- 


Pireats'.Oceupit 


ion (J) 


ip ■ ■ ■ . 


;;243 


.051 


I.Q, Estliate , 






:385 


;oai 